Allophone-based acoustic modeling for Persian phoneme recognition
Abstract:
Phoneme recognition is one of the fundamental stages of automatic speech recognition. Coarticulation, which refers to the blending of adjacent sounds, is one of the major obstacles in phoneme recognition: each phone is influenced and altered by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. Modeling the effects of speech context and using context-dependent models in phoneme recognition is a method used to compensate for the negative effects of coarticulation. Under this approach, if two similar phonemes occur in different contexts, each of them is modeled separately. In this research, a linguistic method called allophonic modeling is used to model context effects in Persian phoneme recognition. For this purpose, in the first phase, the rules governing the occurrence of the various allophones of each phoneme are extracted from Persian linguistic resources, so each phoneme is treated as a class consisting of its context-dependent variants, called allophones. The necessary prerequisite for modeling and identifying allophones is an allophonic corpus. Since no such corpus existed for Persian, the SMALL FARSDAT corpus has been used. This corpus is manually segmented and labelled at the sentence, word, and phoneme level, so the phonological and linguistic context required for the realization of allophones can be encoded in it. For example, syllabification is performed on the corpus, and then the position of each phoneme (first, middle, or end) in the word and in the syllable is marked with numeric tags. In the next step, allophonic labelling is performed by searching the corpus for each of the allophonic contexts. The resulting allophonic corpus is used to model and recognize the allophones of the input speech. Finally, each allophone is assigned to its proper phonemic class, so phoneme recognition is carried out through allophones. The experimental results show the high accuracy of the proposed method in phoneme recognition, indicating a significant improvement compared with other state-of-the-art methods.
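The labeling pipeline described in the abstract can be illustrated with a small sketch. Everything specific in it is a made-up example rather than the paper's actual rules: the position tags, the right-context conditions, and allophone names such as t_asp or n_velar are hypothetical stand-ins for the rules extracted from Persian linguistic resources and applied to the SMALL FARSDAT annotations. The sketch only shows the mechanics: position and context tags trigger an allophone label, and every allophone maps back to exactly one phonemic class.

```python
# Minimal sketch of context-dependent allophone labeling and the
# allophone-to-phoneme mapping described in the abstract.
# The rules and tag values below are hypothetical examples, not the
# actual rules extracted from Persian linguistic resources.

from dataclasses import dataclass

@dataclass
class Segment:
    phoneme: str        # base phoneme label from the corpus
    syl_pos: str        # position in the syllable: "first", "middle", "end"
    word_pos: str       # position in the word: "first", "middle", "end"
    next_phoneme: str   # right-hand context ("" at utterance end)

def allophone_of(seg: Segment) -> str:
    """Apply hypothetical allophonic rules; the allophone label keeps the
    phoneme name as a prefix, so the reverse mapping stays trivial."""
    if seg.phoneme == "t":
        if seg.syl_pos == "first":
            return "t_asp"        # e.g. aspirated variant in syllable-initial position
        if seg.word_pos == "end" and seg.next_phoneme in {"b", "d", "g", ""}:
            return "t_unrel"      # e.g. unreleased variant at word end before a stop
    if seg.phoneme == "n" and seg.next_phoneme in {"k", "g"}:
        return "n_velar"          # nasal assimilated to a velar place of articulation
    return seg.phoneme            # default: no context-dependent variant

def phoneme_class(allophone: str) -> str:
    """Map an allophone label back to its phonemic class."""
    return allophone.split("_")[0]

# Toy utterance with position tags, as produced by syllabification and
# word/syllable position tagging of the corpus.
utterance = [
    Segment("t", "first", "first", "a"),
    Segment("a", "middle", "middle", "n"),
    Segment("n", "end", "end", "g"),
    Segment("g", "first", "first", "i"),
]

allophones = [allophone_of(s) for s in utterance]
phonemes = [phoneme_class(a) for a in allophones]
print(allophones)   # ['t_asp', 'a', 'n_velar', 'g']
print(phonemes)     # ['t', 'a', 'n', 'g']
```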
Similar resources
Phoneme recognition using acoustic events
This paper presents a new approach to phoneme recognition using nonsequential sub-phoneme units. These units are called acoustic events and are phonologically meaningful as well as recognizable from speech signals. Acoustic events form a phonologically incomplete representation as compared to distinctive features. This problem may partly be overcome by incorporating phonological constraints. Cu...
Automatic phoneme alignment based on acoustic-phonetic modeling
This paper presents a method for speaker-independent automatic phonetic alignment that is distinguished from standard HMM-based “forced alignment” in three respects: (1) specific acoustic-phonetic features are used, in addition to PLP features, by the phonetic classifier; (2) the units of classification consist of distinctive phonetic features instead of phonemes; and (3) observation probabilit...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
A soft segment modeling approach for duration modeling in phoneme recognition systems
The geometric distribution of state durations is one of the main performance-limiting assumptions of hidden Markov modeling of speech signals. Stochastic segment models generally, and segmental HMMs specifically, partly overcome this deficiency at the cost of more complexity in both the training and recognition phases. In this paper, a new duration modeling approach is presented. The main idea of ...
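As a quick illustration of the limitation mentioned in this abstract (not of the paper's proposed method), the snippet below contrasts the geometric state-duration distribution implied by a standard HMM self-loop with an explicit parametric duration density such as a gamma distribution. The self-loop probability and the gamma parameters are arbitrary toy values.

```python
# Geometric duration implied by an HMM self-loop vs. an explicit gamma duration model.
import numpy as np
from scipy.stats import gamma, geom

durations = np.arange(1, 31)          # state duration in frames
p_stay = 0.9                          # HMM self-transition probability

# P(duration = d) = p_stay^(d-1) * (1 - p_stay): always monotonically decreasing.
hmm_duration = geom.pmf(durations, 1.0 - p_stay)

# An explicit duration density can instead peak at a plausible segment length.
explicit_duration = gamma.pdf(durations, a=5.0, scale=2.0)

print(hmm_duration[:5].round(3))
print(explicit_duration[:5].round(3))
```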
Conversion from phoneme based to grapheme based acoustic models for speech recognition
This paper focuses on acoustic modeling in speech recognition. A novel approach for building grapheme based acoustic models by conversion from existing phoneme based acoustic models is proposed. The grapheme based acoustic models are created as a weighted sum of monophone acoustic models. The influence of a particular monophone is determined by the phoneme-to-grapheme confusion matrix. Furthe...
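To make the weighted-sum construction concrete, here is a minimal numpy sketch under assumed details: the phoneme and grapheme inventories, the confusion counts, and the single-Gaussian mean vectors are all toy stand-ins for what would, in a real system, be full HMM state parameters combined with the paper's actual phoneme-to-grapheme confusion matrix.

```python
# Toy illustration: a grapheme model as a confusion-matrix-weighted sum of monophone models.
import numpy as np

phonemes  = ["k", "s", "a"]
graphemes = ["c", "a"]

# Hypothetical confusion counts[i, j]: how often phoneme i aligns with grapheme j.
counts = np.array([[80.,  0.],    # "k" mostly maps to the letter "c"
                   [20.,  0.],    # "s" sometimes maps to "c"
                   [ 0., 95.]])   # "a" maps to the letter "a"

weights = counts / counts.sum(axis=0, keepdims=True)   # column-normalized per grapheme

# Toy monophone "models": one mean vector per phoneme (a real system would
# combine full HMM/GMM parameters state by state).
dim = 4
monophone_means = {p: np.random.randn(dim) for p in phonemes}

grapheme_means = {}
for j, g in enumerate(graphemes):
    grapheme_means[g] = sum(weights[i, j] * monophone_means[p]
                            for i, p in enumerate(phonemes))

print({g: m.round(2) for g, m in grapheme_means.items()})
```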
Wavelet based feature extraction for phoneme recognition
In an effort to provide a more efficient representation of the acoustical speech signal in the pre-classification stage of a speech recognition system, we consider the application of the Best-Basis Algorithm of Coifman and Wickerhauser. This combines the advantages of using a smooth, compactly-supported wavelet basis with an adaptive time-scale analysis dependent on the problem at hand. We star...
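A minimal sketch of the best-basis idea of Coifman and Wickerhauser referred to above, under assumed details: a hand-rolled Haar wavelet packet split and the usual additive entropy cost, applied to a synthetic frame rather than real speech features or the smooth wavelet basis the paper uses.

```python
# Best-basis selection over a Haar wavelet packet tree with an additive entropy cost.
import numpy as np

def haar_split(x):
    """One wavelet packet split into approximation and detail coefficients."""
    x = x[: (len(x) // 2) * 2]
    return (x[0::2] + x[1::2]) / np.sqrt(2.0), (x[0::2] - x[1::2]) / np.sqrt(2.0)

def cw_entropy(x, eps=1e-12):
    """Additive entropy cost used for best-basis comparisons."""
    e = x ** 2
    return float(-(e * np.log(e + eps)).sum())

def best_basis(x, levels):
    """Return (cost, leaves): the cheapest representation of x in the packet tree."""
    if levels == 0 or len(x) < 2:
        return cw_entropy(x), [x]
    a, d = haar_split(x)
    cost_a, leaves_a = best_basis(a, levels - 1)
    cost_d, leaves_d = best_basis(d, levels - 1)
    parent_cost = cw_entropy(x)
    if cost_a + cost_d < parent_cost:    # children encode x more sparsely: split
        return cost_a + cost_d, leaves_a + leaves_d
    return parent_cost, [x]              # otherwise keep this node as a basis element

frame = np.sin(2 * np.pi * 0.05 * np.arange(256)) + 0.1 * np.random.randn(256)
cost, leaves = best_basis(frame, levels=4)
print(round(cost, 2), [len(l) for l in leaves])   # adaptive tiling of the frame
```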
Volume 17, Issue 3
Pages 37-54
Publication date: 2020-11